[小ネタ]fluentdの起動でJob for td-agent.service failed because the control process exited with error code.エラーが発生した時の対処方法
こんにちは!AWSエンジニアのおつまみです!
先日fluentdでEC2のログをS3に出力する際に困った内容をシェアします!
困っていた内容
- fluentdの起動で
Job for td-agent.service failed because the control process exited with error code.
エラーが発生。 - fluentdが起動できず、S3にログが出力できなかった。
いきなり結論
td-agent --dry-run -c /etc/td-agent/td-agent.conf
コマンドで設定ファイルの構文エラーを確認しましょう。
背景
fluentdインストール後、設定ファイル/etc/td-agent/td-agent.conf
を以下のように設定しました。
# read apache logs <source> @type tail <parse> @type none </parse> path /var/log/httpd/access_log pos_file /var/log/td-agent/tmp/access.log.pos tag os.apache.access </source> # read secure logs <source> @type tail <parse> @type none </parse> path /var/log/secure pos_file /var/log/td-agent/tmp/access.log.pos tag os.secure </source> # send to S3 <match os.**> @type s3 s3_bucket yourbucketname path ${tag}/ s3_region ap-northeast-1 time_slice_format %Y/%m/%d s3_object_key_format %{path}%{time_slice}/%{index}.%{file_extension} <format> @type single_value </format> <buffer tag,time> @type file path /var/log/td-agent/s3/${tag} timekey 30 timekey_wait 30 timekey_use_utc true </buffer> </match>
その後、systemctl start td-agent.service
コマンドでサービスを起動しようとすると、以下のエラーが発生しました。
$ systemctl status td-agent.service Job for td-agent.service failed because the control process exited with error code. See "systemctl status td-agent.service" and "journalctl -xeu td-agent.service" for details.
エラーに表示されている通りのコマンドを実行しましたが、詳細な原因がわかりませんでした。
$ systemctl status td-agent.service × td-agent.service - td-agent: Fluentd based data collector for Treasure Data Loaded: loaded (/usr/lib/systemd/system/td-agent.service; enabled; preset: disabled) Active: failed (Result: exit-code) since Mon 2023-08-28 11:28:59 UTC; 35s ago Duration: 2h 5min 57.904s Docs: https://docs.treasuredata.com/display/public/PD/About+Treasure+Data%27s+Server-Side+Agent Process: 1935 ExecStart=/opt/td-agent/bin/fluentd --log $TD_AGENT_LOG_FILE --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS (code=exited, status=1/FAILURE) CPU: 1.201s Aug 28 11:28:58 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Control process exited, code=exited, status=1/FAILURE Aug 28 11:28:58 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Failed with result 'exit-code'. Aug 28 11:28:58 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data. Aug 28 11:28:58 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Consumed 1.201s CPU time. Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Scheduled restart job, restart counter is at 5. Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Stopped td-agent: Fluentd based data collector for Treasure Data. Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Consumed 1.201s CPU time. Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Start request repeated too quickly. Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Failed with result 'exit-code'. Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data.
$ journalctl -xeu td-agent.service ░░ Subject: A start job for unit td-agent.service has failed ░░ Defined-By: systemd ░░ Support: https://access.redhat.com/support ░░ ░░ A start job for unit td-agent.service has finished with a failure. ░░ ░░ The job identifier is 3441 and the job result is failed. Aug 28 11:28:58 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Consumed 1.201s CPU time. ░░ Subject: Resources consumed by unit runtime ░░ Defined-By: systemd ░░ Support: https://access.redhat.com/support ░░ ░░ The unit td-agent.service completed and consumed the indicated resources. Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Scheduled restart job, restart counter is at 5. ░░ Subject: Automatic restarting of a unit has been scheduled ░░ Defined-By: systemd ░░ Support: https://access.redhat.com/support ░░ ░░ Automatic restarting of the unit td-agent.service has been scheduled, as the result for ░░ the configured Restart= setting for the unit. Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Stopped td-agent: Fluentd based data collector for Treasure Data. ░░ Subject: A stop job for unit td-agent.service has finished ░░ Defined-By: systemd ░░ Support: https://access.redhat.com/support ░░ ░░ A stop job for unit td-agent.service has finished. ░░ ░░ The job identifier is 3526 and the job result is done. Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Consumed 1.201s CPU time. ░░ Subject: Resources consumed by unit runtime ░░ Defined-By: systemd ░░ Support: https://access.redhat.com/support ░░ ░░ The unit td-agent.service completed and consumed the indicated resources. Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Start request repeated too quickly. Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Failed with result 'exit-code'. ░░ Subject: Unit failed ░░ Defined-By: systemd ░░ Support: https://access.redhat.com/support ░░ ░░ The unit td-agent.service has entered the 'failed' state with result 'exit-code'. Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data. ░░ Subject: A start job for unit td-agent.service has failed ░░ Defined-By: systemd ░░ Support: https://access.redhat.com/support ░░ ░░ A start job for unit td-agent.service has finished with a failure. ░░ ░░ The job identifier is 3526 and the job result is failed.
対処方法
冒頭にも記載したとおり、fluentd --dry-run -c /etc/td-agent/td-agent.conf
コマンドで設定ファイルの構文エラーを確認します。
2023-08-28 12:05:21 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil 2023-08-28 12:05:21 +0000 [info]: parsing config file is succeeded path="/etc/td-agent/td-agent.conf" 2023-08-28 12:05:21 +0000 [info]: gem 'fluentd' version '1.16.1' 2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-calyptia-monitoring' version '0.1.3' 2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '5.3.0' 2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-flowcounter-simple' version '0.1.0' 2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-kafka' version '0.19.0' 2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-metrics-cmetrics' version '0.1.2' 2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-opensearch' version '1.1.0' 2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-prometheus' version '2.0.3' 2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-prometheus_pushgateway' version '0.1.0' 2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.1.1' 2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0' 2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-s3' version '1.7.2' 2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-sd-dns' version '0.1.0' 2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5' 2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-td' version '1.2.0' 2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-utmpx' version '0.5.0' 2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-webhdfs' version '1.5.0' 2023-08-28 12:05:21 +0000 [info]: starting fluentd-1.16.1 as dry run mode ruby="3.1.4" 2023-08-28 12:05:22 +0000 [error]: config error file="/etc/td-agent/td-agent.conf" error_class=Fluent::ConfigError error="Other 'in_tail' plugin already use same pos_file path: plugin_id = object:cd0, pos_file path = /var/log/td-agent/tmp/access.log.pos"
すると、一番下にerror
メッセージが表示されていました。
内容を確認すると、pos_file
で指定したpath
が重複していることが原因でした。
そのため、設定ファイルを修正します。
# read apache logs <source> @type tail <parse> @type none </parse> path /var/log/httpd/access_log pos_file /var/log/td-agent/tmp/access.log.pos tag os.apache.access </source> # read secure logs <source> @type tail <parse> @type none </parse> path /var/log/secure pos_file /var/log/td-agent/tmp/secure.log.pos tag os.secure </source> # send to S3 <match os.**> @type s3 s3_bucket yourbucketname path ${tag}/ s3_region ap-northeast-1 time_slice_format %Y/%m/%d s3_object_key_format %{path}%{time_slice}/%{index}.%{file_extension} <format> @type single_value </format> <buffer tag,time> @type file path /var/log/td-agent/s3/${tag} timekey 30 timekey_wait 30 timekey_use_utc true </buffer> </match>
再度、fluentd --dry-run -c /etc/td-agent/td-agent.conf
コマンドで設定ファイルの構文エラーを確認します。
2023-08-28 12:08:11 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil 2023-08-28 12:08:11 +0000 [info]: parsing config file is succeeded path="/etc/td-agent/td-agent.conf" 2023-08-28 12:08:11 +0000 [info]: gem 'fluentd' version '1.16.1' 2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-calyptia-monitoring' version '0.1.3' 2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '5.3.0' 2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-flowcounter-simple' version '0.1.0' 2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-kafka' version '0.19.0' 2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-metrics-cmetrics' version '0.1.2' 2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-opensearch' version '1.1.0' 2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-prometheus' version '2.0.3' 2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-prometheus_pushgateway' version '0.1.0' 2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.1.1' 2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0' 2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-s3' version '1.7.2' 2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-sd-dns' version '0.1.0' 2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5' 2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-td' version '1.2.0' 2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-utmpx' version '0.5.0' 2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-webhdfs' version '1.5.0' 2023-08-28 12:08:11 +0000 [info]: starting fluentd-1.16.1 as dry run mode ruby="3.1.4" 2023-08-28 12:08:12 +0000 [info]: using configuration file: <ROOT> <source> @type tail path "/var/log/httpd/access_log" pos_file "/var/log/td-agent/tmp/access.log.pos" tag "os.apache.access" <parse> @type "none" unmatched_lines </parse> </source> <source> @type tail path "/var/log/secure" pos_file "/var/log/td-agent/tmp/secure.log.pos" tag "os.secure" <parse> @type "none" unmatched_lines </parse> </source> <match os.**> @type s3 s3_bucket "td-agent-test-o2mami-bucket" path "${tag}/" s3_region "ap-northeast-1" time_slice_format %Y/%m/%d s3_object_key_format "%{path}%{time_slice}/%{index}.%{file_extension}" <format> @type "single_value" </format> <buffer tag,time> @type "file" path "/var/log/td-agent/s3/${tag}" timekey 30 timekey_wait 30 timekey_use_utc true </buffer> </match> </ROOT> 2023-08-28 12:08:12 +0000 [info]: finished dry run mode
error
メッセージが表示されず、設定ファイルが正しく修正されたようでした。
この状態でサービスを起動してみます。
$ systemctl start td-agent $ systemctl status td-agent.service ● td-agent.service - td-agent: Fluentd based data collector for Treasure Data Loaded: loaded (/usr/lib/systemd/system/td-agent.service; enabled; preset: disabled) Active: active (running) since Mon 2023-08-28 12:11:44 UTC; 6s ago Docs: https://docs.treasuredata.com/display/public/PD/About+Treasure+Data%27s+Server-Side+Agent Process: 2065 ExecStart=/opt/td-agent/bin/fluentd --log $TD_AGENT_LOG_FILE --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS (code=exited, status=0/SUCCESS) Main PID: 2071 (fluentd) Tasks: 8 (limit: 4421) Memory: 109.6M CPU: 2.481s CGroup: /system.slice/td-agent.service ├─2071 /opt/td-agent/bin/ruby /opt/td-agent/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid └─2074 /opt/td-agent/bin/ruby -Eascii-8bit:ascii-8bit /opt/td-agent/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid --un> Aug 28 12:11:43 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Starting td-agent: Fluentd based data collector for Treasure Data... Aug 28 12:11:44 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Started td-agent: Fluentd based data collector for Treasure Data.
無事、active
状態となり、S3にログが出力されました!
さいごに
今回はfluentdの起動に失敗した時の対処方法をお伝えしました。
最後までお読みいただきありがとうございました!
どなたかのお役に立てれば幸いです。
以上、おつまみ(@AWS11077)でした!